Unique Generalized Abbreviations (hard)

Problem Statement #

Given a word, write a function to generate all of its unique generalized abbreviations.

Generalized abbreviation of a word can be generated by replacing each substring of the word by the count of characters in the substring. Take the example of “ab” which has four substrings: “”, “a”, “b”, and “ab”. After replacing these substrings in the actual word by the count of characters we get all the generalized abbreviations: “ab”, “1b”, “a1”, and “2”.

Example 1:

Input: "BAT"
Output: "BAT", "BA1", "B1T", "B2", "1AT", "1A1", "2T", "3"

Example 2:

Input: "code"
Output: "code", "cod1", "co1e", "co2", "c1de", "c1d1", "c2e", "c3", "1ode", "1od1", "1o1e", "1o2", 
"2de", "2d1", "3e", "4"

Try it yourself #

Try solving this question here:

Output

2.310s

Generalized abbreviation are: [] Generalized abbreviation are: []

Solution #

This problem follows the Subsets pattern and can be mapped to Balanced Parentheses. We can follow a similar BFS approach.

Let’s take Example-1 mentioned above to generate all unique generalized abbreviations. Following a BFS approach, we will abbreviate one character at a time. At each step we have two options:

  • Abbreviate the current character, or
  • Add the current character to the output and skip abbreviation.

Following these two rules, let’s abbreviate BAT:

  1. Start with an empty word: “”
  2. At every step, we will take all the combinations from the previous step and apply the two abbreviation rules to the next character.
  3. Take the empty word from the previous step and add the first character to it. We can either abbreviate the character or add it (by skipping abbreviation). This gives us two new words: _, B.
  4. In the next iteration, let’s add the second character. Applying the two rules on _ will give us _ _ and 1A. Applying the above rules to the other combination B gives us B_ and BA.
  5. The next iteration will give us: _ _ _, 2T, 1A_, 1AT, B _ _, B1T, BA_, BAT
  6. The final iteration will give us:3, 2T, 1A1, 1AT, B2, B1T, BA1, BAT

Here is the visual representation of this algorithm:

Created with Fabric.js 1.6.0-rc.1 Given word: BAT Add 3rd character: T Add 2nd character: A Add 1st character: B _ B _ _ 1A B_ BA _ _ _ 2T 1A_ 1AT B_ _ B1T BA_ BAT 3 2T 1A1 1AT B2 B1T BA1 BAT

Code #

Here is what our algorithm will look like:

Output

2.288s

Generalized abbreviation are: [3, 2T, 1A1, 1AT, B2, B1T, BA1, BAT] Generalized abbreviation are: [4, 3e, 2d1, 2de, 1o2, 1o1e, 1od1, 1ode, c3, c2e, c1d1, c1de, co2, co1e, cod1, code]

Time complexity #

Since we had two options for each character, we will have a maximum of 2N2^N combinations. If you see the visual representation of Example-1 closely you will realize that it is equivalent to a binary tree, where each node has two children. This means that we will have 2N2^N leaf nodes and 2N12^N-1 intermediate nodes, so the total number of elements pushed to the queue will be 2N2^N + 2N1,2^N-1, which is asymptotically equivalent to O(2N)O(2^N). While processing each element, we do need to concatenate the current string with a character. This operation will take O(N)O(N), so the overall time complexity of our algorithm will be O(N2N)O(N*2^N).

Space complexity #

All the additional space used by our algorithm is for the output list. Since we can’t have more than O(2N)O(2^N) combinations, the space complexity of our algorithm is O(N2N)O(N*2^N).

Recursive Solution #

Here is the recursive algorithm following a similar approach:

Output

2.715s

Generalized abbreviation are: [3, 2T, 1A1, 1AT, B2, B1T, BA1, BAT] Generalized abbreviation are: [4, 3e, 2d1, 2de, 1o2, 1o1e, 1od1, 1ode, c3, c2e, c1d1, c1de, co2, co1e, cod1, code]

Mark as Completed
←    Back
Balanced Parentheses (hard)
Next    →
Problem Challenge 1